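Code-mixed text data consists of sentences containing words or phrases from more than one language. Most multilingual communities worldwide communicate using multiple languages, with English usually being one of them. Hinglish is code-mixed text composed of Hindi and English but written in Roman script. This paper aims to determine the factors that influence the quality of system-generated code-mixed text data. For the HinglishEval task, the proposed model uses multilingual BERT to find the similarity between synthetically generated and human-generated sentences in order to predict the quality of synthetically generated Hinglish sentences.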
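The similarity step can be illustrated with a minimal sketch. It assumes sentence embeddings have already been computed (for example by pooling multilingual BERT outputs) and that cosine similarity is the measure; the abstract specifies neither, and the vectors below are made-up placeholders rather than real mBERT embeddings. `cosine_similarity` is an illustrative helper name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Placeholder embeddings (hypothetical values, NOT real mBERT outputs).
synthetic_vec = [0.2, 0.7, 0.1, 0.4]
human_vec = [0.25, 0.6, 0.0, 0.5]

score = cosine_similarity(synthetic_vec, human_vec)
print(f"similarity: {score:.3f}")
```

A score near 1 means the two embeddings point in nearly the same direction; how such a score maps to a quality rating is left open here.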
In this paper, we present an evolved version of Situational Graphs, which jointly models, in a single optimizable factor graph, both a SLAM graph, as a set of robot keyframes containing their associated measurements and robot poses, and a 3D scene graph, as a high-level representation of the environment that encodes its geometric elements with semantic attributes and the relational information between those elements. Our proposed S-Graphs+ is a novel four-layered factor graph that includes: (1) a keyframes layer with robot pose estimates, (2) a walls layer representing wall surfaces, (3) a rooms layer encompassing sets of wall planes, and (4) a floors layer gathering the rooms within a given floor level. This graph is optimized in real time to obtain a robust and accurate estimate of the robot's pose and its map, simultaneously constructing and leveraging the high-level information of the environment. To extract such high-level information, we present novel room and floor segmentation algorithms that use the mapped wall planes and free-space clusters. We tested S-Graphs+ on multiple datasets, including simulations of distinct indoor environments, real datasets captured over several construction sites and office environments, and a real public dataset of indoor office environments. S-Graphs+ outperforms relevant baselines on the majority of the datasets while extending the robot's situational awareness with a four-layered scene model. Moreover, we make the algorithm available as a Docker file.
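The four-layer hierarchy described above can be sketched as a plain data structure. This only illustrates the bookkeeping between layers (keyframes observe walls, rooms group walls, floors group rooms); it contains no factors and no optimization, and every class, field, and method name is hypothetical rather than taken from the S-Graphs+ code.

```python
from dataclasses import dataclass, field


@dataclass
class FourLayerGraph:
    keyframes: dict = field(default_factory=dict)     # keyframe id -> pose (x, y, yaw)
    walls: dict = field(default_factory=dict)         # wall id -> plane (nx, ny, nz, d)
    rooms: dict = field(default_factory=dict)         # room id -> set of wall ids
    floors: dict = field(default_factory=dict)        # floor id -> set of room ids
    observations: list = field(default_factory=list)  # (keyframe id, wall id) edges

    def add_keyframe(self, kf_id, pose):
        self.keyframes[kf_id] = pose

    def add_wall(self, wall_id, plane, seen_from):
        # A wall enters the graph through the keyframe that observed it.
        if seen_from not in self.keyframes:
            raise KeyError(f"unknown keyframe {seen_from!r}")
        self.walls[wall_id] = plane
        self.observations.append((seen_from, wall_id))

    def add_room(self, room_id, wall_ids):
        missing = set(wall_ids) - set(self.walls)
        if missing:
            raise KeyError(f"unknown walls {sorted(missing)}")
        self.rooms[room_id] = set(wall_ids)

    def add_floor(self, floor_id, room_ids):
        missing = set(room_ids) - set(self.rooms)
        if missing:
            raise KeyError(f"unknown rooms {sorted(missing)}")
        self.floors[floor_id] = set(room_ids)

    def walls_on_floor(self, floor_id):
        # Walk down the hierarchy: floor -> rooms -> walls.
        found = set()
        for room_id in self.floors[floor_id]:
            found |= self.rooms[room_id]
        return found


graph = FourLayerGraph()
graph.add_keyframe("k0", (0.0, 0.0, 0.0))
graph.add_wall("w1", (1.0, 0.0, 0.0, 2.0), seen_from="k0")
graph.add_wall("w2", (0.0, 1.0, 0.0, 3.0), seen_from="k0")
graph.add_room("r1", ["w1", "w2"])
graph.add_floor("f0", ["r1"])
print(sorted(graph.walls_on_floor("f0")))
```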
We present Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. For 9 of the 11 languages, it contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location and Organization). The training dataset has been created automatically from the Samanantar parallel corpus by projecting automatically tagged entities from an English sentence to the corresponding Indian-language sentence. We also create manually annotated test sets for 8 languages, containing approximately 1000 sentences per language. We demonstrate the utility of the obtained dataset on existing test sets and on the Naamapadam-test data for 8 Indic languages. We also release IndicNER, a multilingual mBERT model fine-tuned on the Naamapadam training set. IndicNER achieves the best F1 on the Naamapadam-test set compared to an mBERT model fine-tuned on existing datasets. IndicNER achieves an F1 score of more than 80 for 7 out of 11 Indic languages. The dataset and models are available under open-source licenses at https://ai4bharat.iitm.ac.in/naamapadam.
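The projection idea can be sketched as follows, assuming that token-level word alignments between the English and the target sentence are available. The abstract does not say how the projection is carried out, so the alignment-based approach, the function name `project_labels`, and the hand-written alignment below are all assumptions made for illustration.

```python
def project_labels(src_labels, alignment, tgt_len):
    """Copy entity labels from source tokens to their aligned target tokens.

    src_labels: one label per source token ("PER", "LOC", "ORG" or "O").
    alignment:  (source index, target index) pairs.
    tgt_len:    number of tokens in the target sentence.
    """
    tgt_labels = ["O"] * tgt_len
    for src_idx, tgt_idx in alignment:
        label = src_labels[src_idx]
        if label != "O":
            tgt_labels[tgt_idx] = label
    return tgt_labels


# English:  Sachin lives in Mumbai
# Hindi (romanized here for readability): Sachin Mumbai mein rehta hai
english_labels = ["PER", "O", "O", "LOC"]
alignment = [(0, 0), (1, 3), (2, 2), (3, 1)]  # hypothetical, hand-written alignment

projected = project_labels(english_labels, alignment, tgt_len=5)
print(projected)
```

Unaligned target tokens keep the default label `O`; a real pipeline would also need to handle multi-token entities and alignment errors, which this sketch ignores.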
Efficient localization plays a vital role in many modern applications of unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs), contributing to improved control, safety, power economy, etc. The ubiquitous 5G NR (New Radio) cellular network will provide new opportunities for enhancing the localization of UAVs and UGVs. In this paper, we review radio frequency (RF) based approaches to localization. We review the RF features that can be used for localization and investigate the current methods suitable for unmanned vehicles under two general categories: range-based and fingerprinting. The existing state-of-the-art literature on RF-based localization for both UAVs and UGVs is examined, and the envisioned use of 5G NR for localization enhancement and future research directions are explored.
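As an illustration of the range-based category, the sketch below recovers a 2D position from distances to three transmitters at known positions by subtracting the circle equations pairwise, which leaves a 2x2 linear system. It assumes noise-free ranges and a flat 2D geometry; the anchor positions are hypothetical, and nothing here is specific to 5G NR.

```python
import math


def trilaterate(anchors, ranges):
    """Solve for (x, y) from three anchor positions and measured distances."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = ranges
    # Subtracting circle 1 from circles 2 and 3 removes the quadratic terms.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # hypothetical transmitter positions
true_position = (3.0, 4.0)
ranges = [math.dist(true_position, a) for a in anchors]

print(trilaterate(anchors, ranges))
```

With noisy ranges or more than three anchors, a least-squares solve over the same linearized equations is the usual extension.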
Abusive language is a concerning problem in online social media. Past research on detecting abusive language covers different platforms, languages, demographics, etc. However, models trained using these datasets do not perform well in cross-domain evaluation settings. To overcome this, a common strategy is to use a few samples from the target domain to train models to get better performance in that domain (cross-domain few-shot training). However, this might cause the models to overfit the artefacts of those samples. A compelling solution could be to guide the models toward rationales, i.e., spans of text that justify the text's label. This method has been found to improve model performance in the in-domain setting across various NLP tasks. In this paper, we propose RAFT (Rationale Adaptor for Few-shoT classification) for abusive language detection. We first build a multitask learning setup to jointly learn rationales, targets, and labels, and find a significant improvement of 6% macro F1 on the rationale detection task over training solely rationale classifiers. We introduce two rationale-integrated BERT-based architectures (the RAFT models) and evaluate our systems over five different abusive language datasets, finding that in the few-shot classification setting, RAFT-based models outperform baseline models by about 7% in macro F1 scores and perform competitively with models finetuned on other source domains. Furthermore, RAFT-based models outperform LIME/SHAP-based approaches in terms of plausibility and are close in performance in terms of faithfulness.
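Robots in the construction industry can reduce costs by continuously monitoring work progress using high-precision data capture. Accurate data capture requires precise localization of the mobile robot within the environment. In this paper, we present a novel approach to robot localization that extracts geometric, semantic, and topological information from architectural plans in the form of walls and rooms, and creates the topological and metric-semantic layers of the Situational Graph (S-Graph) before navigating the environment. As the robot navigates the construction environment, it estimates its pose with a particle filter, using robot odometry and sensor observations in the form of planar walls extracted from 3D LiDAR measurements, and exploiting the previously built situational graph with its geometric, semantic, and topological information. We validate our approach on simulated and real datasets captured on actual ongoing construction sites, comparing it against traditional geometry-based localization techniques.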
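The predict, weight, and resample cycle of a particle filter can be sketched in a heavily simplified setting: a robot moving along one axis toward a single wall whose position is known from a plan, with a range measurement to that wall. The 1D geometry, the noise levels, and all names below are assumptions made for illustration; the method in the abstract estimates a full pose from planar walls and a situational graph.

```python
import math
import random


def particle_filter_step(particles, odom, measured_dist, wall_pos, rng,
                         motion_sigma=0.05, meas_sigma=0.2):
    """One predict / weight / resample cycle for a 1D position estimate."""
    # Predict: apply the odometry increment plus motion noise.
    moved = [p + odom + rng.gauss(0.0, motion_sigma) for p in particles]
    # Weight: Gaussian likelihood of the measured distance to the known wall.
    weights = [
        math.exp(-0.5 * ((wall_pos - p - measured_dist) / meas_sigma) ** 2)
        for p in moved
    ]
    # Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))


rng = random.Random(0)   # fixed seed so the run is repeatable
wall_pos = 10.0          # wall position taken from the (hypothetical) plan
true_pos = 2.0
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]

for _ in range(10):
    true_pos += 0.5
    measured = (wall_pos - true_pos) + rng.gauss(0.0, 0.1)
    particles = particle_filter_step(particles, 0.5, measured, wall_pos, rng)

estimate = sum(particles) / len(particles)
print(f"true position {true_pos:.2f}, estimate {estimate:.2f}")
```

The particles start spread uniformly over the corridor, so the first measurement does most of the localization and the later steps track the motion.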
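Community detection is a classic problem in network science with extensive applications in various fields. The most commonly used methods are algorithms designed to maximize a utility function, modularity, across the different ways a network can be partitioned into communities. Despite their name and design philosophy, current modularity maximization algorithms generally fail to maximize modularity or to guarantee any proximity to an optimal solution. We propose the Bayan algorithm which, unlike existing methods, returns network partitions with a guarantee of either optimality or proximity to an optimal solution. At the core of the Bayan algorithm is a branch-and-cut scheme that solves a sparse integer programming formulation of the modularity maximization problem to optimality or approximates it within a factor. We analyze the performance of Bayan against 22 existing algorithms using synthetic and real networks. Through extensive experiments, we demonstrate Bayan's distinctive capabilities not only in maximizing modularity but, more importantly, in accurately retrieving ground-truth communities. Bayan's comparative level of performance remains stable over variations in the amount of noise in the data (graph) generation process. The performance of Bayan as an exact modularity maximization algorithm also reveals the theoretical capability limits of maximum-modularity partitions in the accurate retrieval of communities. Overall, our analysis points to Bayan as a suitable choice for methodologically grounded detection of communities through exact (approximate) maximization of modularity in networks with $\sim10^3$ edges (and larger networks). Prospective advances in graph optimization and integer programming can push these limits further.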
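For reference, the objective being maximized can be evaluated for a given partition as in the sketch below. It uses the standard modularity of an undirected, unweighted graph without self-loops, $Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right]$, where $m$ is the number of edges, $L_c$ the number of edges inside community $c$, and $d_c$ the sum of degrees in $c$. It only scores a partition; it is not the Bayan branch-and-cut procedure, and the function name and example graph are illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the graph at the bridge scores higher than putting all nodes in one community, which always yields zero.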
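Mobile robots should be aware of their situation, which comprises a deep understanding of their surrounding environment along with an estimate of their own state, in order to make intelligent decisions and execute tasks autonomously in real environments. 3D scene graphs are an emerging field of research that proposes to represent the environment in a joint model comprising geometric, semantic, and relational/topological dimensions. Although 3D scene graphs have already been combined with SLAM techniques to provide robots with situational understanding, further research is still required to deploy them effectively on board mobile robots. To this end, we present in this paper a novel, real-time, online-built Situational Graph (S-Graph), which combines, in a single optimizable graph, the representation of the environment along the aforementioned three dimensions together with the robot pose. Our method uses odometry readings and planar surfaces extracted from 3D LiDAR scans to construct and optimize in real time a three-layered S-Graph that includes (1) a robot tracking layer where the robot poses are registered, (2) a metric-semantic layer with features such as planar walls, and (3) our novel topological layer, which constrains the planar walls using higher-level features such as corridors and rooms. Our proposal not only demonstrates state-of-the-art results for robot pose estimation, but also contributes a metric model of the environment.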
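Recent works on open-domain question answering refer to an external knowledge base using a retriever model, optionally rerank the retrieved passages with a separate reranker model, and generate an answer using another reader model. Despite performing related tasks, the models have separate parameters and are only weakly coupled during training. In this work, we propose casting the retriever and the reranker as hard-attention mechanisms applied sequentially within the transformer architecture and feeding the resulting computed representations to the reader. In this single model architecture, the hidden representations are progressively refined from the retriever to the reranker to the reader, which makes more efficient use of model capacity and also leads to better gradient flow when we train it in an end-to-end manner. We also propose a pre-training method to train this architecture effectively. We evaluate our model on the Natural Questions and TriviaQA open datasets, and for a fixed parameter budget, our model outperforms the previous state-of-the-art model by 1.0 and 0.7 exact match points.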
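One way to read "hard attention applied sequentially" is as two successive top-k selections, where the second stage only sees what the first one kept. The sketch below illustrates that reading with dot-product scores over toy vectors. The abstract does not describe the actual mechanism inside the transformer, so the scoring function, the values of k, and all names and vectors here are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hard_attend(query_vec, candidates, k):
    """Return the ids of the k candidates with the highest dot-product score."""
    ranked = sorted(candidates,
                    key=lambda cid: dot(query_vec, candidates[cid]),
                    reverse=True)
    return ranked[:k]


# Toy passage representations (hypothetical values).
passages = {
    "p1": [0.9, 0.1],
    "p2": [0.8, 0.3],
    "p3": [0.1, 0.9],
    "p4": [0.2, 0.2],
}
retriever_query = [1.0, 0.0]
reranker_query = [0.5, 1.0]  # stands in for a more refined hidden representation

# Stage 1: the retriever keeps the top 2 passages out of all candidates.
retrieved = hard_attend(retriever_query, passages, k=2)
# Stage 2: the reranker picks the best passage among the retrieved ones only.
reranked = hard_attend(reranker_query,
                       {pid: passages[pid] for pid in retrieved}, k=1)

print(retrieved, reranked)
```

The selection is "hard" in the sense that discarded passages contribute nothing downstream, in contrast to soft attention, which would weight every passage.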
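In recent years, high-speed navigation and environment interaction in the context of aerial robotics have become a field of interest for several academic and industrial research studies. In particular, Search and Intercept (SaI) applications are a compelling research area because of their potential usability in several environments. Nevertheless, SaI tasks involve challenging development with respect to sensory weight, on-board computation resources, actuation design, and algorithms for perception and control. In this work, a fully autonomous aerial robot for high-speed object grasping is proposed. As an additional sub-task, our system is able to autonomously pierce balloons mounted on poles close to the surface. Our first contribution is the design of the aerial robot at the actuation and sensory level, including a novel gripper design with additional sensors that enables the robot to grasp objects at high speed. The second contribution is a complete software framework comprising perception, state estimation, motion planning, motion control, and mission control, in order to perform the autonomous grasping mission rapidly and robustly. Our approach has been validated in a challenging international competition and has shown outstanding results, being able to autonomously search for, follow, and grasp a moving object at 6 m/s in an outdoor environment.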